1,310 research outputs found

Aligned Image-Word Representations Improve Inductive Transfer Across Vision-Language Tasks

Authors: Gupta Tanmay, Hoiem Derek, Shih Kevin, Singh Saurabh
Publication date: 16/10/2017

An important goal of computer vision is to build systems that learn visual representations over time that can be applied to many tasks. In this paper, we investigate a vision-language embedding as a core representation and show that it leads to better cross-task transfer than standard multi-task learning. In particular, the task of visual recognition is aligned to the task of visual question answering by forcing each to use the same word-region embeddings. We show this leads to greater inductive transfer from recognition to VQA than standard multi-task learning. Visual recognition also improves, especially for categories that have relatively few recognition training labels but appear often in the VQA setting. Thus, our paper takes a small step towards creating more general vision systems by showing the benefit of interpretable, flexible, and trainable core representations.

Comment: Accepted in ICCV 2017. The arXiv version has an extra analysis on correlation with human attention.

arXiv.org e-Print Archive

Crossref

Recommended from our members

Information Brokers: A Comparison of the Web Browser Choices between Internet Users in the US and China

Author: Shih Kevin Jer-Kang
Publication venue: eScholarship, University of California
Publication date: 01/01/2020

By treating web browsers as information brokers, this dissertation found that the rise of Google Chrome in China and the United States (two countries with vastly different regulations) is contingent on Google and its competitors’ cultural reputations (as suggested by previous research). This dissertation also found that Chrome’s popularity in the US and China is affected by how it is connected to other market entities and popular web services. By examining how a widely used tool is institutionalized in two different countries, this dissertation articulates a new theoretical framework, combining the sociology of consumption and social network theory, that is better suited to studying online platforms that broker content for internet users.

eScholarship - University of California

Walverine: A Walrasian Trading Agent

Authors: Daniel M. Reeves, Evan Leung, Kevin M. Lochner, Kevin O'Malley, L. Julian Schvartzman, Michael P. Wellman, Shih-Fen Cheng

TAC-02 was the third in a series of Trading Agent Competition events fostering research in automating trading strategies by showcasing alternate approaches in an open-invitation market game. TAC presents a challenging travel-shopping scenario where agents must satisfy client preferences for complementary and substitutable goods by interacting through a variety of market types. Michigan's entry, Walverine, bases its decisions on a competitive (Walrasian) analysis of the TAC travel economy. Using this Walrasian model, we construct a decision-theoretic formulation of the optimal bidding problem, which Walverine solves in each round of bidding for each good. Walverine's optimal bidding approach, as well as several other features of its overall strategy, are potentially applicable in a broad class of trading environments.

Keywords: trading agent, trading competition, tatonnement, competitive equilibrium

Research Papers in Economics

Collecting The Puzzle Pieces: Disentangled Self-Driven Human Pose Transfer by Permuting Textures

Authors: Li Nannan, Plummer Bryan A., Shih Kevin J.
Publication date: 30/08/2023

Human pose transfer synthesizes new view(s) of a person for a given pose. Recent work achieves this via self-reconstruction, which disentangles a person's pose and texture information by breaking the person down into parts, then recombines them for reconstruction. However, part-level disentanglement preserves some pose information that can create unwanted artifacts. In this paper, we propose Pose Transfer by Permuting Textures (PT$^2$), an approach for self-driven human pose transfer that disentangles pose from texture at the patch-level. Specifically, we remove pose from an input image by permuting image patches so only texture information remains. Then we reconstruct the input image by sampling from the permuted textures for patch-level disentanglement. To reduce noise and recover clothing shape information from the permuted patches, we employ encoders with multiple kernel sizes in a triple branch network. On DeepFashion and Market-1501, PT$^2$ reports significant gains on automatic metrics over other self-driven methods, and even outperforms some fully-supervised methods. A user study also reports images generated by our method are preferred in 68% of cases over self-driven approaches from prior work. Code is available at https://github.com/NannanLi999/pt_square.

Comment: Accepted to ICCV 202

arXiv.org e-Print Archive